Systems and methods for configuring networks

ABSTRACT

The disclosed computer-implemented method may include (i) generating a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers, (ii) filtering a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices, (iii) obtaining physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices, and (iv) allocating the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model. Various other methods, systems, and computer-readable media are also disclosed.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to U.S. Provisional Application No. 63/165,691, filed Mar. 24, 2021, titled “SYSTEMS AND METHODS FOR CONFIGURING NETWORKS,” which is incorporated by reference herein in its entirety.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the present disclosure.

FIG. 1 is a flow diagram of an example method for configuring networks.

FIG. 2 is a block diagram of an example system for configuring networks.

FIG. 3 is a diagram of a related methodology for configuring networks based on a single traffic matrix.

FIG. 4 is a diagram of a methodology for configuring networks based on a hose-based approach rather than a pipe-based approach.

FIG. 5 is a diagram of an example convex polytope that corresponds to a data center constraint model.

FIG. 6 is a diagram of an example workflow for generating reference traffic matrices as inputs to a cost optimizer formulation.

FIG. 7 is a diagram of example cut sets across a network topology.

FIG. 8 is a diagram that illustrates a sweeping methodology for generating cut sets.

FIG. 9 is an integer linear programming formulation for filtering traffic matrices.

FIG. 10 is a series of diagrams that illustrate differences between simple cut sets and complicated cut sets.

Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the present disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Modern technology companies often utilize large or global computing networks to provide one or more services to users or customers. For example, a social networking technology company may provide a global cross-layer backbone network between a multitude of data centers to provide such services. As illustrative examples, such services may include messaging services, photo storage services, and/or backend data analytics and database services. Nevertheless, the size and scale of such networks pose challenges when planning to ensure that the networks have sufficient capacity and resources to accommodate demand. For example, the size and scale of such networks may require a capacity planner to forecast three months to two years ahead of expected demand to properly anticipate and install network capacity upgrades.

Making forecasts so far ahead is inherently challenging and fraught with uncertainty. For example, predicting how different services evolve over time in production is difficult given the number of services that are supported. Moreover, possible traffic profile changes include service architecture changes, relabeling of Quality of Service (QoS) classes, traffic shifts for load-balancing, new service launches, and many other operational constraints that force service moves. To handle these uncertainties, a traffic forecast may specify the 90th percentile of the trend to ensure sufficiently high confidence in the forecast.

Moreover, related capacity planning methodologies are often based on a single traffic matrix, as discussed further below, which creates a risk of biasing the design toward that single traffic matrix. In case of a future deviation from the single traffic matrix, additional capacity may be required to minimize total network risk. Operationally, any observed traffic shift requires investigation by network planners to identify whether the change is catastrophic for the network. Overall, this related methodology is reactive, ad hoc, and time-consuming. Accordingly, this application discloses improved systems and methods for configuring networks, as discussed in more detail below.

As discussed further below, this application generally presents a design and operational experience of a hose-based backbone network planning system. The initial adoption of the hose model in network planning may be driven by capacity and demand uncertainty that places pressure on backbone expansion. Since the hose model abstracts a level of aggregated traffic volume per site, peak traffic flows at different times can be multiplexed to save capacity and buffer traffic spikes. In one embodiment, the design involves heuristic algorithms to select hose-compliant traffic matrices and cross-layer optimization between optical and Internet Protocol networks.

Features from any of the embodiments described herein may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.

This application discloses technology that may increase the ability of network capacity upgrade planners to efficiently plan upgrades to network capacity while additionally better handling uncertainty in the forecasting of future network traffic growth. The technology may generally achieve these benefits by applying a hose-based approach, rather than a pipe-based approach, to network planning, as discussed in more detail below. In particular, the technology may fundamentally base the plan for the network capacity upgrade on a constraint model that builds in uncertainty and, therefore, generates a multitude of different traffic matrices to be satisfied by the network capacity upgrade plan. The resulting plan will thereby be much more resilient and effective than conventional plans that are based upon a single network traffic matrix that was forecast to describe expected growth in network traffic. Such single network traffic matrices may be brittle and less effective at handling uncertainty.

Although the technology here generates network capacity upgrade plans that are more resilient, the tolerance for uncertainty in network traffic growth generates a much larger and potentially less tractable set of traffic matrices to be satisfied by the network capacity upgrade plan. Accordingly, the technology of this application may also provide techniques for filtering an initial set of traffic matrices to produce a more tractable set of traffic matrices, which may be considered as dominating traffic matrices, as discussed further below.

Global online service providers build wide-area backbone networks for connecting thousands of Point-of-Presence (PoP) sites and hundreds of Data Centers (DCs) across continents. To keep up with the explosive traffic growth, tremendous money and engineering effort are constantly being invested in expanding and upgrading the backbone network. Network planning is thus important to the backbone evolution, with the ultimate goal of devising capacity-efficient network build plans that are resilient to unforeseen demand uncertainties from service changes and traffic dynamics.

The technology described herein may achieve this goal by innovatively adopting the hose model in backbone planning. This application shares a design and deployment of hose-based network planning. Unlike the standard pipe model used in network planning that abstracts the network as traffic demands between site pairs, the hose model specifies the aggregated ingress/egress traffic volume per site. This model enables one to plan for “peak of sum” traffic with the hose model rather than “sum of peak” traffic with the pipe model. Because peak traffic flows are unlikely to happen simultaneously, the multiplexing gain results in significant capacity saving and leaves capacity headroom for traffic uncertainties in the future. One measurement shows that the hose model has 25% lower demand and can tolerate 50% more traffic uncertainty.

In the hose model, for each node, VPN provisioning and VM placement map hose capacity specification to a fixed network topology. Routing paths to different source/destination nodes are opportunistically chosen to share links, with the purpose of preserving the aggregated traffic demand in the hose model to the largest degree. However, when paths split in the network, a hose capacity cap is allocated on each link to serve the worst-case traffic, under the business contract that VPN and VM tenants' capacity requirements should be strictly guaranteed. This approach does not apply to network planning. First, the network topology may be unknown and may benefit from planning. Second, the capacity duplication at path splits may follow the “sum of peak” abstraction and may compromise the multiplexing gain of the hose model. In particular, this may leave the network underloaded, at the price of building an over-provisioned backbone.

One contribution of this application is a solution to the problem described above. The solution may leverage the insight that, regardless of a network topology, capacity should be granted to node pairs point-to-point, which naturally fits the pipe model. Therefore, the problem may involve converting a hose input into pipe traffic matrices (TMs). However, the continuous hose space contains an infinite number of TMs, thus it may be computationally intractable to plan for all possible pipe TMs under the hose constraints. A challenge is to generate a small subset of TMs to represent the hose space. This application proposes a series of heuristic algorithms to address this challenge. The application therefore discloses a design for a sampling scheme to generate candidate TMs uniformly in the hose space. From these TMs, one may find critical TMs that stress current bottleneck links, which are potential locations to deploy additional capacity. This application proposes a sweeping algorithm to quickly find bottleneck links in the network. Critical TMs may be chosen through optimization, and the application may also describe “hose coverage” as a metric to quantify how representative these chosen TMs are. Another contribution of this application is to share a production network planning process, with practical considerations in a network setting. The application may further describe the separation of short-term and long-term planning, the abstraction to simplify interaction between the optical and Internet Protocol layers, the resilience policy to protect against failures, and the optical-to-Internet-Protocol cross-layer capacity optimization. The application may also further evaluate the performance of the hose-based network planning system in production. The application may demonstrate that the hose model can save 17.4% capacity compared to the pipe model and may drop up to 75% less traffic under unplanned failures. With these advantages of the hose model, the network can scale up on a per-node basis, as easily as storage and compute resources, in the future. This concludes the more detailed introduction.

The following provides a more detailed discussion of the motivation for the technology of this application. The hose model may abstract the network by aggregating ingress and egress traffic demand per node, without the model necessarily knowing the specific source and sink of each traffic flow. This model contrasts with the pipe model that presents the pair-wise traffic demand between nodes. In related systems, network planning is based on the pipe model, because pairwise capacity deployment plans may be realized in the end. However, production data may demonstrate the benefits of the hose model in capacity planning.

The technology here may be based in part on an analysis of production traffic on a network backbone. To eliminate the time-of-day effect, the analysis may consider an especially busy hour of day, when the total traffic in the backbone is the highest in the day. The traffic may be sampled once per minute. Among the sixty data points in the busy hour, the pipe model may consider the 90th percentile as the peak traffic demand as specified in a traffic forecast. For hose model demand, the analysis may aggregate the ingress and egress traffic per site for each data collection point and obtain the “peak of sum,” or the 90th percentile of the summed traffic values. A real-world traffic forecast may use the moving average of the 90th percentile peak traffic across an N-day window to smooth the traffic demand. The analysis may also add 3× the standard deviation of the N-day data to the moving average as a buffer for sudden traffic spikes. To mimic this process, one may apply this method to obtain the hose and pipe “average peak” demand, as opposed to the “daily peak” demand described previously. Specific advantages of the hose model are as follows.
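
The busy-hour computation above lends itself to a compact illustration. The following is a minimal numpy sketch, assuming per-minute samples arranged in a (60, N) array and illustrative function and parameter names; it is not the production forecasting pipeline.

```python
import numpy as np

def demand_estimates(samples, percentile=90):
    """Contrast pipe-model and hose-model peak demand for one site.

    `samples` is a (T, N) array of per-minute traffic from one site to
    each of N peer sites during the busy hour (T = 60 here). The shapes
    and names are illustrative assumptions, not a production schema.
    """
    # Pipe model: take the 90th percentile of each flow independently,
    # then sum -- the "sum of peak".
    sum_of_peak = np.percentile(samples, percentile, axis=0).sum()
    # Hose model: sum the flows at each sampling instant first, then
    # take the 90th percentile of the aggregate -- the "peak of sum".
    peak_of_sum = np.percentile(samples.sum(axis=1), percentile)
    return sum_of_peak, peak_of_sum

def smoothed_forecast(daily_peaks):
    """N-day moving average plus a 3x standard deviation spike buffer."""
    return np.mean(daily_peaks) + 3 * np.std(daily_peaks)

# Flows that peak at different minutes multiplex well, so the hose
# ("peak of sum") estimate typically falls below the pipe ("sum of
# peak") estimate.
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=50.0, size=(60, 8))  # Gbps
print(demand_estimates(samples))
```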

One advantage is traffic reduction. One difference between the hose model and the pipe model when deploying capacity is the difference between “peak of sum” and “sum of peak” traffic. If using the hose model, the multiplexing gain allows the technology here to plan for less capacity, as the instances of pipe traffic sharing the same source/sink are unlikely to reach the peak simultaneously. Because the “average peak” traffic is the realistic input to a traffic forecast, there may be a good reason to conclude that a considerable proportion of capacity can be saved just by adopting the hose model for planning.

Another advantage is tolerance to traffic dynamics. The multiplexing effect also means that hose model planning can cover more traffic variations. In one example, if a plan is made for 0.55 units of capacity, this will cover 90% of the cases in the hose model, but only 40% in the pipe model. The higher percentile in the hose model indicates that it can tolerate more traffic uncertainty. Since the hose model is constrained by the aggregated traffic amount, not by a particular TM, it has more headroom to absorb unexpected traffic spikes.

Another advantage is increased stability in traffic demand. One analysis may measure the variance of hose and pipe traffic. To make different traffic volumes comparable, the analysis may use the coefficient of variation as the metric, which is the standard deviation of the traffic volume divided by the mean. In terms of the coefficient of variation for the “daily peak” traffic, the relative traffic dispersion in the hose model is much smaller than in the pipe model, with a shorter tail as well. As a result, the hose model provides a more stable signal for planning and simplifies traffic forecasting. With these properties, the technology described herein may enable the network to scale up as easily as storage and compute resources, where a node can have an accurate approximation of its future growth, without a concern about the interaction with other nodes in the network.

An additional advantage is increased adaptation to service evolution. Services evolve over time in production. Possible causes include service behavior changes, relabeling of Quality of Service (QoS) classes, traffic shifts for load balancing, new service launches, and many others. In one example of a user database (UDB) service, due to resource and operational constraints, the UDB servers storing data may sit in a few regions, and UDB-less regions may use a caching service to fetch data from the UDB regions nearby. In terms of the amount of traffic flowing from UDB regions B and C to UDB-less region A, a significant traffic change is a result of the service changing the primary UDB region from B to C. Two specific incidents may create several Tbps of traffic shifts, where a pipe model would fail. In contrast, because the total traffic amount stayed the same, the hose ingress traffic at region A experienced little disruption. The traffic aggregation nature of the hose model is naturally more resilient to service changes, making it a future-proof solution to network planning. This concludes the detailed discussion of the motivation for the technology described herein.

Detailed descriptions of a method for configuring networks are provided with reference to FIG. 1. Detailed descriptions of a system for configuring networks are provided in connection with FIG. 2. FIGS. 3-10 provide diagrams that help elaborate on the technology disclosed in FIGS. 1-2.

FIG. 1 is a flow diagram of an exemplary computer-implemented method 100 for configuring networks. The steps shown in FIG. 1 may be performed by any suitable computer-executable code and/or computing system, including the system illustrated in FIG. 2. In one example, each of the steps shown in FIG. 1 may represent an algorithm whose structure includes and/or is represented by multiple sub-steps, examples of which will be provided in greater detail below.

The flow diagram of FIG. 1 may be performed at least in part by system 200 of FIG. 2. As further shown in this figure, system 200 may include a memory 240 and a physical processor 230. Within memory 240, modules 202 may be stored. Modules 202 may include a generation module 204, a filtering module 206, an obtaining module 208, and an allocation module 210. Modules 202 may facilitate the performance of method 100 at least in part by interacting with additional elements 220, which may include a data center constraint model 222 and a set of traffic matrices 224, as discussed in more detail below.

The following provides a more detailed description of the structure and capabilities of system 200. In terms of a network model, a backbone network may connect a number of DCs and PoPs together. The network model may include IP routers over a Dense Wavelength Division Multiplexing optical network. The IP routers may be connected using IP links that route over multiple fiber segments. In one model, the network is represented as a two-layer graph: the IP network G=(V, E), where the vertices V are IP nodes and the edges E are IP links, and the optical network G′=(V′, E′), where the vertices V′ are optical nodes and the edges E′ are fiber segments.

For each IP link e∈E, FS(e) may be the set of fiber segments that e rides over, which form a path on the optical topology. The IP link e consumes a portion of spectrum on each fiber segment l∈E′ over which e is realized. For example, a 100 Gbps IP link realized using Quadrature Phase Shift Keying modulation can consume 50 GHz of spectrum over all fiber segments in its path.
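
To make the two-layer model concrete, the following sketch represents G, G′, and FS(e) as simple Python data structures. The class and field names, as well as the intermediate site in the example path, are illustrative assumptions rather than a production schema.

```python
from dataclasses import dataclass, field

@dataclass
class OpticalNetwork:
    nodes: set[str]                       # V': optical nodes
    fiber_segments: set[tuple[str, str]]  # E': fiber segments

@dataclass
class IPLink:
    src: str
    dst: str
    capacity_gbps: float
    fiber_path: list[tuple[str, str]]     # FS(e): segments e rides over
    spectrum_ghz: float                   # spectrum consumed per segment

@dataclass
class IPNetwork:
    nodes: set[str]                       # V: IP nodes
    links: list[IPLink] = field(default_factory=list)

# A 100 Gbps IP link using QPSK modulation might consume 50 GHz of
# spectrum on every fiber segment along its optical path. "FAT" is a
# hypothetical intermediate optical site used only for illustration.
lax_sne = IPLink(src="LAX", dst="SNE", capacity_gbps=100.0,
                 fiber_path=[("LAX", "FAT"), ("FAT", "SNE")],
                 spectrum_ghz=50.0)
```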

In terms of a failure model, the technology described herein may analyze a set of fiber failures in the backbone. Every IP link e∈E over the failed fibers would be down. In order to provide desired reliability to the service traffic, the technology described herein may pre-define a set of failures R referred to as planned failures. The production network should be planned with sufficient capacity such that all service traffic can be routed for each failure r∈R.

System 200 may solve the following problem statement. Network capacity is the maximum throughput (in Gbps, Tbps, or Pbps) that the IP network, and individual IP links, can carry. The problem of capacity planning is to compute the desired network capacity to be built in the future. Building a network involves complex steps, including: (1) procure fibers from third-party providers, (2) build terrestrial and submarine fiber routes, (3) pull fibers on existing ducts, (4) install line systems to light up the fibers, (5) procure, deliver, and install hardware (optical and IP) at sites, and/or (6) secure space and power at optical amplifiers and sites. All these activities have long lead times, taking months or even years to deliver. Thus, capacity planning is helpful to the future evolution and profitability of the network.

In a network planning problem, the objective may be to dimension the network for the forecast traffic under the planned failure set R by minimizing the total cost of the solution. The cost of the network may be calculated based on a weighted function of equipment (fibers and other optical and IP hardware) procurement, deployment, and maintenance to realize the network plan.

In terms of planning schemes, the technology described herein may categorize capacity planning into two sub-problems: short-term planning and long-term planning. Short-term planning may output an exact IP topology (i.e., the IP links and the capacity on each link), whereas long-term planning may determine the fibers and hardware to procure. A corresponding design decision may be based on the fact that network building can be an iterative process and long-term planning may serve as a reference point in many scenarios. For example, the fiber procurement plan may change at execution time according to the availability of fiber resources on the market. Short-term planning may be conducted after fiber and hardware are secured and in place, because turning up capacity may be requested on short notice.

In terms of the planning process, network planning may begin with a traffic forecast. Unlike approaches that model total traffic per site, the technology described herein may profile the growth of each individual service. This method may be more accurate for a use case where new DCs are deployed yearly and services are migrated to different DCs when necessary. As the traffic forecast is based on the hose model, the aggregated hose constraints from individual services are fed to the capacity planner as the overall hose demand in the backbone.

One aspect of hose-based network planning is converting the hose constraints into pipe TMs. Thus, the technology described herein may narrow down an infinite number of possible pipe TMs to a small set of representative ones. Short-term and long-term planning may then be based on the reference TMs with different optimization formulations, considering various failure scenarios under the resilience policy.

The output of planning may be a Plan Of Record (POR), in the format of capacity between site pairs. The POR from short-term planning may be provided to a capacity engineering component for capacity turn-up, and the POR from long-term planning may be provided to a fiber sourcing component for fiber procurement and to an optical design and IP design component for deployment of fibers and optical line systems.

Returning to FIG. 1, at step 110 one or more of the systems described herein may generate a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers. For example, FIG. 7 shows an illustrative example of a network topology with a multitude of different data centers as nodes. Examples of such data centers may include LAX and SNE, as further shown in the figure. Accordingly, at step 110, generation module 204 may generate data center constraint model 222 by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers, such as data centers LAX or SNE shown in FIG. 7. The term “data center constraint model” may refer to a model that places a constraint on data center ingress and/or egress network traffic, as discussed in more detail below in connection with FIGS. 4-5.

Generation module 204 may perform step 110 in a variety of ways. FIGS. 3-4 help to illustrate the general methodology by which generation module 204 may perform step 110. In particular, FIG. 3 shows a related methodology that produces a single traffic matrix 302 as one input to a capacity planner 308. In addition to traffic matrix 302, a design policy 304 and a backbone topology 306 may also be input to capacity planner 308. Design policy 304 may specify requirements or desired features of the upgraded network. For example, design policy 304 may specify an amount of network traffic that should be handled by the capacity of one or more data centers. Design policy 304 may also specify one or more potential failure conditions that the upgraded network should nevertheless be able to handle or satisfy with sufficient capacity. Backbone topology 306 may correspond to a generally longitudinal and latitudinal network topology with data centers as nodes and backbone lines as links within a corresponding graph. FIG. 7 shows an illustrative example of such a backbone topology. Lastly, capacity planner 308 may generate a network design 310, which may further specify a capacity plan, a fiber turn-up level or architecture, and/or a level of space and power per data center or site.

It may be helpful to describe the intended meaning of traffic matrix 302. The rows and columns of this traffic matrix generally correspond to data centers, such as the data centers shown in the network topology of FIG. 7. Although FIGS. 3-4 show N×M traffic matrices for illustrative purposes, traffic matrix 302 may also form an N×N matrix, with the rows and columns functioning as indices for the same set of data centers. For example, the top-left square of traffic matrix 302 may specify or indicate expected or forecasted network traffic between a first data center (e.g., as a source) and the same first data center (e.g., as a sink). As such, the value at this particular square may be blank or null, since the technology here is generally less concerned with intra-data-center network traffic as distinct from inter-data-center network traffic. The diagonal from the top-left square to the bottom-right square may similarly be blank or null in a parallel manner.

Nevertheless, the square to the right of the top-left square of traffic matrix 302 may indicate expected network traffic between the first data center (corresponding to the first row), as a source, and a second data center (corresponding to the second column), as a sink. And so on. Overall, traffic matrix 302 may therefore indicate expected or forecasted network traffic between each pair of data centers formed by the permutations of a set of data centers, such as those shown in FIG. 7, for example.

The related methodology of FIG. 3 may suffer from one or more problems or inefficiencies. In particular, the use of a single traffic matrix makes the corresponding forecast relatively brittle. In other words, the use of a single traffic matrix makes it relatively more difficult to handle uncertainty in the forecast of network traffic growth. The use of the single traffic matrix effectively forces the capacity planner to make a specific guess about how network traffic will grow in the future. The use of the single traffic matrix also creates computational challenges due to the requirement of generating N×N (minus the diagonal) forecast calculations. It would, therefore, be helpful to identify a methodology that addresses uncertainty in forecasting network traffic growth while also better handling or reducing the computational challenges. The methodology of FIG. 3 may be referred to as a “pipe-based” approach that calculates a volume of network traffic for each pair within the set of data centers, where each pair forms a “pipe.”

FIG. 4 shows a diagram of an improved methodology for forecasting network traffic growth that addresses the problems outlined above, and which may be performed or facilitated by one or more of modules 202 according to method 100, as discussed above. As further shown in this figure, a traffic matrix 402 may correspond to traffic matrix 302 of FIG. 3. The large “X” in FIG. 4 indicates that the methodology of FIG. 4 will omit this single traffic matrix and, instead, use a vector 406 and a vector 404. Vector 406 may indicate the summation of the values along corresponding columns of traffic matrix 402. Similarly, vector 404 may indicate the summation of the values along corresponding rows of traffic matrix 402. The improved methodology of FIG. 4 may generate a traffic forecast 410.
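
The reduction sketched in FIG. 4 can be illustrated in a few lines: the hose vectors are simply the row and column sums of the pipe traffic matrix. A minimal sketch with illustrative values follows.

```python
import numpy as np

# Instead of forecasting every entry of the N x N traffic matrix, the
# hose model keeps only its row sums (egress per site) and column sums
# (ingress per site).
tm = np.array([[0., 4., 1.],
               [2., 0., 3.],
               [5., 1., 0.]])   # pipe model: N*(N-1) forecasts

egress = tm.sum(axis=1)   # analogous to vector 404: per-source demand
ingress = tm.sum(axis=0)  # analogous to vector 406: per-sink demand

print(egress)   # [5. 5. 6.] -- only N numbers to forecast
print(ingress)  # [7. 5. 4.]
```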

The data center constraint model may place a constraint on a total amount of ingress and/or egress network traffic from a data center, such as the first data center. FIG. 5 shows an illustrative diagram of a hose constraint 502 that may be placed on a set 506 of four data centers. In this specific example, node 1 may generate egress network traffic, whereas node 2, node 3, and node 4 do not generate any egress network traffic. As an illustrative example, the data center constraint model may place a constraint of “10” on the total amount of egress network traffic from node 1. The constraint may be defined in any suitable metric or unit for measuring a volume, flow, or bandwidth of network traffic. In such examples, different constraint values may be placed on different data centers or the same value may be used. Similarly, different constraint values may be placed on ingress versus egress network traffic, or the same value may be used. Alternatively, the data center constraint model may place a constraint of “10” on the total amount of ingress or egress network traffic from all of the nodes, which in this scenario would result in the same output, because all of the egress network traffic in this example comes from node 1. FIG. 5 also further illustrates how placing hose constraint 502 on a data center within set 506 of the four data centers indicates a corresponding convex polytope 504.

Convex polytope 504 may form a surface with a mathematically continuous set of points along this surface. Each point within the set of points may correspond to a single traffic matrix, analogous to traffic matrix 402, which satisfies hose constraint 502. In particular, the points along the surface may maximally satisfy the hose constraint, whereas points within convex polytope 504 underneath the surface may satisfy the hose constraint without achieving the maximum value of 10. The mathematically continuous nature of the set of points on the surface may create a computational challenge in terms of calculating a network capacity upgrade plan based on an analysis of the traffic matrices that correspond to the set of points. In particular, such a continuous set will generally be infinite in nature. For this reason, it may be helpful to filter the set of points to create a finite and more tractable set of points for analysis. Such a finite and tractable set of points may include, for example, the three extremities or maximal points along convex polytope 504, which effectively form corners of the convex polytope. Further techniques and refinements for selecting a more tractable set of representative traffic matrices will be discussed further in connection with FIGS. 6-10. The methodology of FIGS. 4-5 that places a constraint on a total amount of ingress and/or egress network traffic at a data center may be referred to as the “hose-based” approach, as distinct from the pipe-based approach of FIG. 3.

Leveraging the hose-based approach may provide a number of substantial benefits and improvements over the pipe-based approach. In particular, usage of the hose-based approach reduces the level of the traffic forecast while nevertheless preserving a level of confidence that the resulting network capacity plan will not become overburdened. In other words, usage of the hose-based approach reduces the need to overshoot or overcorrect the network capacity plan to accommodate uncertainty in the forecast. This may further result in a statistical multiplexing gain, where the hose-based approach is calculated as the peak of sums and the pipe-based approach is calculated as the sum of peaks, in terms of network traffic. Usage of the hose-based approach may also reduce a daily peak value by 10% to 15% and may reduce an average peak level by 20% to 25%. The hose-based approach to forecasting is also more stable and introduces a lower level of variance. Similarly, usage of the hose-based approach creates benefits in terms of operational simplicity. Planning complexity is reduced because the hose model requires N calculations rather than the N² required by the pipe model. Usage of the hose-based approach also simplifies cross-region demand shifts.

Returning to FIG. 1, at step 120 one or more of the systems described herein may filter a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices. As discussed above, FIG. 3 shows example traffic matrix 302, and FIG. 7 shows examples of network topology cut sets. Accordingly, at step 120, filtering module 206 may filter a set of traffic matrices, such as traffic matrix 302, that indicate or sample points in data center constraint model 222 by comparing the set of traffic matrices against the cut sets of the network topology that is shown in FIG. 7.

Filtering module 206 may perform step 120 in a variety of ways. From a high-level perspective, FIG. 6 shows a workflow 601 that outlines a general process for filtering reference traffic matrices from an original set of traffic matrices. At step 602, the data center constraint model referenced at step 110 may be input to a reference traffic matrix generator 604. Reference traffic matrix generator 604 may generate a filtered set 606 of traffic matrices, which may serve as reference or representative traffic matrices that, at step 608, may be input to a cost optimizer integer linear programming formulation 610. Reference traffic matrix generator 604 may operate in accordance with the examples of FIGS. 8-10. Returning to FIG. 6, at step 612, cost optimizer integer linear programming formulation 610 may output a network capacity upgrade plan for a cross-layer network upgrade architecture, as discussed further below.

As first introduced earlier above, FIG. 7 shows an illustrative example of a network backbone topology with nodes as data centers and non-dashed lines as backbone lines for a corresponding network. The example of this figure corresponds to North America. Nevertheless, method 100 may be performed in connection with any other global, regional, or local inter-data-center network for an area, state, territory, country, continent, or the entire world. FIG. 7 also illustrates how four graph cuts, including a graph cut 702, a graph cut 704, a graph cut 706, and a graph cut 708, may form cut sets (i.e., mathematical divisions of the set of nodes into two separate sets of nodes) of the network topology. In particular, FIG. 7 illustrates how these graph cuts form either straight lines (e.g., graph cut 702) or substantially straight lines (e.g., graph cut 708). Each of these graph cuts may indicate or represent a potential network failure, as specified by design policy 304 of FIG. 3, which may require or request that network design 310 provide sufficient capacity to handle network traffic across the corresponding graph cut despite such a network failure. Illustrative examples of such network failures may include one or more of singular fiber cuts, dual submarine link failures, and/or repeated failures.

FIG. 8 shows a diagram of an illustrative sweeping methodology for generating straight or substantially straight graph cuts and corresponding cut sets. In particular, the methodology may sweep in a direction 802 around the set of nodes to produce a set of four cut sets, including a cut set 804, a cut set 806, a cut set 808, and a cut set 810, which may be formed by a graph cut 805, a graph cut 807, a graph cut 809, and a graph cut 811. The methodology may draw a reference cut line at each sweeping step, which may split the nodes into three mutually exclusive categories. These three categories are identified by reference key 812 in FIG. 8. The first category may include edge nodes, which indicate a definitional value smaller than a threshold alpha, where the definitional value is defined as the distance of the edge node to the original (e.g., straight) cut line divided by the distance of the farthest node in the network to the cut line. The second category may include above nodes, which are above the cut line but which are not included within the edge nodes group. The third category may include below nodes, which are below the cut line but are not included in the edge nodes group. According to the methodology of this figure, the sweeping procedure may center around k points per rectangle side and move in steps of an angle theta, such as an angle 820 and an angle 822 shown in FIG. 8. The reference cut (e.g., the straight line between angle 820 and angle 822) in the example of the sweeping methodology may create two edge nodes, and the permutations of these two edge nodes may form four graph cuts. Accordingly, network cuts may be generated for all possible bipartite splits of the edge nodes combined, respectively, with the above and below nodes.

FIG. 10 shows another set of diagrams that further illustrate techniques for generating cut sets. Diagram 1002 shows an example of a single cut set that may be generated by a vertical dashed line, thereby dividing the set including all eight nodes into two separate sets that each include four nodes, as shown in the figure. Diagram 1004 and diagram 1006 show simple cuts and a complicated cut, respectively, which may be contrasted with each other. Simple cuts may generally form straight lines, as shown in FIG. 10. Nevertheless, simple cuts may also be generated by lines that are substantially straight, without being perfectly straight. For example, simple cuts may include relatively small deviations from the straight line according to the sweeping procedure of FIG. 8 that generates all possible bipartite splits of the edge nodes combined respectively with the above nodes and the below nodes. In some scenarios, simple cuts may be preferable to complicated cuts. For example, the long jagged dashed line in diagram 1006 may correspond to a complicated cut.

Continuing with step 120, filtering module 206 may filter the set of traffic matrices by comparing the traffic matrices to the cut sets, such as the cut sets shown in FIG. 8. In particular, filtering module 206 may search for dominating traffic matrices. The term “dominating traffic matrix” for a network cut may refer to a traffic matrix in the set of traffic matrices (e.g., sampled or random selections of traffic matrices within the convex polytope indicated by the data center constraint model) that has the highest traffic amount across the graph cut. This corresponds to the “strict version” of the dominating traffic matrix selection process. Additionally, or alternatively, a dominating traffic matrix for a network cut may refer to a traffic matrix from the set of traffic matrices having a traffic amount across the cut that is no smaller than 1 minus epsilon of the maximum among all the sampled traffic matrices, where epsilon is a small value in [0, 1]. This corresponds to the “slack version” of the dominating traffic matrix selection process. In some scenarios, the “slack version” of the dominating traffic matrix selection process may be preferred. Generally speaking, dominating traffic matrices may correspond to demand scenarios that are predicted to drive requirements for physical network resources.
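
A minimal sketch of this selection criterion follows, assuming the load on a cut counts demand crossing it in both directions; with epsilon set to 0 it reduces to the strict version, and with epsilon greater than 0 it implements the slack version. The helper names are illustrative.

```python
import numpy as np

def cut_traffic(tm, side_a, side_b):
    """Total traffic that must cross the cut, in both directions."""
    return (tm[np.ix_(side_a, side_b)].sum()
            + tm[np.ix_(side_b, side_a)].sum())

def dominating_tms(tms, side_a, side_b, epsilon=0.0):
    """Indices of TMs that (slackly) dominate the given cut.

    epsilon = 0 recovers the strict version (only maximizing TMs);
    epsilon > 0 gives the slack version described above.
    """
    loads = np.array([cut_traffic(tm, side_a, side_b) for tm in tms])
    return np.flatnonzero(loads >= (1.0 - epsilon) * loads.max())
```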

Returning to the figures, FIG. 9 shows an example integer linear programming methodology that may select or discover the minimum number of traffic matrices with at least one dominating traffic matrix per network cut. The methodology of FIG. 9 may apply an integer linear programming formulation of the minimum set cover problem to the traffic matrices and cut sets of FIGS. 4-8. In this case, the universe to cover is the set of network cuts, and the covering sets are, for each slackly-dominating traffic matrix M (e.g., dominating according to the slack version), the set of network cuts for which that dominating traffic matrix is slackly-dominating. The variable A_M may correspond to a binary assignment variable, which may be 1 if a candidate dominating traffic matrix M is selected. The variable A_M may be 0 otherwise. T may be the set of traffic matrices that are being considered. In other words, T may refer to the set of all sampled traffic matrices that are slackly dominating for some network cut. The top right quadrant of the table in FIG. 9 may specify the number of traffic matrices that are chosen. The methodology of FIG. 9 seeks to minimize this number. D(C) may refer to the set of slackly dominating traffic matrices for network cut C. Lastly, the bottom right quadrant of this table refers to the number of chosen traffic matrices that slackly dominate network cut C. For every network cut C, the goal is for this value to be at least 1. In other words, the goal of the methodology here is to ensure that the set of chosen traffic matrices includes at least one traffic matrix that slackly dominates network cut C according to the slack version of the dominating traffic matrix selection process. Applying the integer linear programming formulation of FIG. 9 may generate the tractable set of dominating traffic matrices, for example. To be clear, the term “dominating traffic matrix” may refer to either strictly or slackly-dominating traffic matrices.

As part of performing step 120, filtering module 206 may generate and sample traffic matrices. The following provides a more detailed discussion of generating such traffic matrices. The generation procedure may include three stages: traffic matrix sampling, bottleneck link sweeping, and dominating traffic matrix selection.

In the first stage of traffic matrix sampling, a traffic matrix (TM) for an N-node network topology may correspond to an N×N matrix, where each coefficient m_(i,j) represents the traffic amount of a flow (typically in Gbps in practice) from the source node i to the destination node j. The flow traffic amount may be non-negative, and in some examples a node does not generate traffic to itself. Hence, the coefficients may be in R₊ and all diagonal coefficients may be zero.

A valid TM may satisfy the following hose constraints, where $\vec{u}_s$ and $\vec{u}'_d$ are the 1×N and N×1 all-ones row and column vectors, and the corresponding demand vectors $\vec{h}_s$ and $\vec{h}'_d$ bound the total egress and ingress traffic amount at the source and destination nodes. These constraints form a convex polytope in the (N²−N)-dimensional space, where each non-zero coefficient in the TM is a variable. FIG. 5 illustrates a highly simplified 3D example with variables m_(1,2), m_(1,3), and m_(1,4). Each valid TM is a point in the polytope space, and there are an infinite number of valid TMs in this continuous space.

Hose Constraints:

$\vec{u}_s \cdot M \le \vec{h}_s$

$M \cdot \vec{u}'_d \le \vec{h}'_d$
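
Because multiplying by the all-ones vectors simply takes row and column sums, a hose-compliance check reduces to comparing per-node totals against the demand vectors. A minimal sketch, with illustrative names, follows.

```python
import numpy as np

def is_valid_tm(tm, h_egress, h_ingress, tol=1e-9):
    """Check a TM against the hose constraints above.

    A TM is valid when its entries are non-negative, its diagonal is
    zero, every node's total egress (row sum) stays within its egress
    bound, and every node's total ingress (column sum) stays within
    its ingress bound.
    """
    return (np.all(tm >= 0)
            and np.all(np.diag(tm) == 0)
            and np.all(tm.sum(axis=1) <= h_egress + tol)
            and np.all(tm.sum(axis=0) <= h_ingress + tol))
```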

To generate TMs that satisfy the hose constraints, a first step is to sample the polytope space uniformly. A corresponding algorithm may include two phases for generating one sample TM. The algorithm may randomly create a valid TM in the polytope space in a first phase and stretch it to the polytope surfaces in a second phase, based on the intuition that TMs on the surfaces have higher traffic demands and translate to higher capacity requests for network planning.

In the first phase, filtering module 206 may initialize the TM to a zero matrix and assign traffic to the TM entries one by one in a random order. For every entry m_(i,j), the maximal allowed traffic amount is the lesser of the two hose constraints for source i and destination j. Filtering module 206 may give this value a uniformly random scaling factor between 0 and 1 and assign the product to the entry in the TM. For bookkeeping, the consumed traffic amount may be deducted from the hose constraints.

In the second phase, filtering module 206 may add residual traffic to the TM to exhaust as many hose constraints as possible. Similar to the first phase, filtering module 206 iterates through the entries in a random order and adds the maximal allowed traffic amount to each entry. Because filtering module 206 iterates through all the entries and always consumes the maximal traffic, the second phase is guaranteed to exhaust the most hose constraints from the result of the first phase. It also guarantees that the egress and ingress hose constraints cannot be simultaneously unsatisfied (remaining constraints must be all egress or all ingress), because if that were the case, the algorithm would simply increase the associated source-destination flows until either the ingress or egress constraints are exhausted.
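
A minimal sketch of the two-phase sampler described above follows, assuming hose demands are given as per-node egress and ingress vectors; function and parameter names are illustrative.

```python
import numpy as np

def sample_tm(h_egress, h_ingress, rng):
    """Generate one hose-compliant TM via the two-phase procedure.

    Phase 1 fills the entries in random order with a random fraction
    of the remaining hose budget; phase 2 stretches the TM toward the
    polytope surface by revisiting entries and consuming the maximal
    residual traffic.
    """
    n = len(h_egress)
    tm = np.zeros((n, n))
    egress_left = np.asarray(h_egress, dtype=float).copy()
    ingress_left = np.asarray(h_ingress, dtype=float).copy()
    entries = [(i, j) for i in range(n) for j in range(n) if i != j]

    # Phase 1: random fraction of the maximal allowed amount per entry.
    rng.shuffle(entries)
    for i, j in entries:
        amount = rng.uniform(0.0, 1.0) * min(egress_left[i], ingress_left[j])
        tm[i, j] += amount
        egress_left[i] -= amount
        ingress_left[j] -= amount

    # Phase 2: exhaust as many hose constraints as possible.
    rng.shuffle(entries)
    for i, j in entries:
        amount = min(egress_left[i], ingress_left[j])
        tm[i, j] += amount
        egress_left[i] -= amount
        ingress_left[j] -= amount

    return tm

# Mirrors the FIG. 5 example: node 1 may emit up to 10 units of egress
# traffic; the ingress bounds here are illustrative.
rng = np.random.default_rng(0)
tm = sample_tm(h_egress=[10, 0, 0, 0], h_ingress=[0, 4, 4, 4], rng=rng)
```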

The sampling algorithm outlined above is highly effective regardless of the simplicity of the network. In some examples, over 97% of the hose polytope space is covered with 10⁵ sample TMs. The effectiveness comes from the high level of randomness: (1) filtering module 206 applies different permutations of the TM entries in each run to distribute the hose traffic budget in different ways, and (2) filtering module 206 uses a scaling factor to adjust the assignable traffic randomly according to the uniform distribution.

Filtering module 206 may also generate traffic matrices in part by sweeping through bottleneck links. It may be computationally infeasible to consider the enormous number of TM samples. Fortunately, different TMs have different levels of importance for network planning. As the goal of network planning is to add capacity to “bottleneck links” in the network, TMs with high traffic demands over the bottleneck links play a dominating role. These may be called Dominating Traffic Matrices (DTMs), and filtering module 206 may aim to find a small number of DTMs such that designing the network explicitly for them has a high probability of satisfying the remaining TMs as well. From the perspective of graph theory, bottleneck links are captured by the network cuts that partition the nodes into two disjoint subsets. However, the number of network cuts is exponential in the network size. A production backbone network has tens to a few hundred nodes, so enumerating all the cuts would be intractable. This application discloses a sweeping algorithm to quickly sample the network cuts, and the sweeping process is illustrated in FIG. 8.

The sweeping algorithm has a hyperparameter, the edge threshold α, chosen in the [0, 1] interval. The network nodes are represented by their latitude and longitude coordinates. The algorithm draws the smallest rectangle enclosing all the nodes and radar-sweeps the graph centering at points on the rectangle sides.

There are k equal-interval points per side, and the sweeping is performed at discrete orientation angles of interval θ. As one example, k=1000 and θ=1°. The algorithm draws a reference cut line at each sweeping step, which splits the nodes into three mutually exclusive categories: (1) edge nodes, whose distance to the cut line divided by the distance of the farthest node in the network to the cut line is smaller than α; (2) above nodes, which are above the cut line but are not in the edge nodes group; and (3) below nodes, which are below the cut line but are not in the edge nodes group.

Network cuts are all possible bipartite splits of the edge nodes combined respectively with the above and below nodes. In this algorithm, parameters k and θ define the sampling granularity, and the edge threshold α regulates the number of cuts considered per sampling step. As α increases, filtering module 206 is able to generate an increasingly large number of network cuts. In particular, setting α to 1 guarantees the enumeration of all partitions of the network.
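
The following sketch implements one sweeping step of this procedure: categorize the nodes into edge, above, and below groups relative to a reference line, then emit all bipartite splits of the edge nodes. The full sweep would repeat this for k centers per rectangle side and angles in steps of θ; the names and geometry helpers are illustrative assumptions.

```python
import itertools
import numpy as np

def cuts_for_reference_line(coords, center, angle, alpha=0.05):
    """Network cuts induced by one sweep step (cf. FIG. 8).

    `coords` is an (N, 2) array of node positions. A reference line
    through `center` at `angle` (radians) splits the nodes into
    edge / above / below groups; the cuts are all bipartite splits of
    the edge nodes combined with the above and below groups. Each cut
    is returned as a frozenset of node indices on one side.
    """
    coords = np.asarray(coords, dtype=float)
    normal = np.array([-np.sin(angle), np.cos(angle)])
    signed = (coords - center) @ normal        # signed distance to line
    ratio = np.abs(signed) / np.abs(signed).max()

    edge = np.flatnonzero(ratio < alpha)
    above = set(np.flatnonzero((signed > 0) & (ratio >= alpha)))

    cuts = set()
    for r in range(len(edge) + 1):
        for subset in itertools.combinations(edge, r):
            side = frozenset(above | set(subset))
            if side and len(side) < len(coords):  # both sides non-empty
                cuts.add(side)
    return cuts
```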

Filtering module 206 may also generate traffic matrices in part by selecting dominating traffic matrices. The formal definition of a DTM with respect to network cuts is as follows. With the TMs sampled and network cuts generated according to the description above, filtering module 206 seeks to find the TM that produces the most traffic for every network cut.

The phrase “dominating traffic matrix (strict version)” may, in some examples, refer to the traffic matrix in all the sampled traffic matrices that has the highest traffic amount across the cut. This definition yields as many DTMs as there are network cuts. To further reduce the number of TMs involved in planning computation, filtering module 206 may leverage the minimum set cover problem: if one slacks the DTM definition from the most traffic-heavy TM per network cut to a set of relatively traffic-heavy TMs within a bound of the maximum, the sets of DTMs for different cuts are likely to overlap and the cuts may be represented by a smaller number of overlapping DTMs. Filtering module 206 may therefore use a flow slack ε and define the slack version of a DTM as below.

Similarly, the phrase “dominating traffic matrix (slack version)” may refer to a traffic matrix from the sampled traffic matrices whose traffic amount across the cut is no smaller than 1−ε of the maximum among all the sampled traffic matrices, where ε is a small value in [0, 1].

Filtering module 206 may formulate the minimum set cover problem such that the universe is the ensemble of network cuts C. For every cut c∈C, filtering module 206 obtains the set of DTMs D(c) under the given flow slack ε according to the definition above. Combining them, filtering module 206 obtains a collection T={M} of all the candidate DTMs, where each DTM belongs to a subset of cuts in C. For example, a DTM may be generated by multiple cuts {c_(i), c_(j), c_(k)} at the same time. Filtering module 206 may have the goal of finding the minimal number of DTMs to cover all the cuts in C.
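
A minimal sketch of constructing D(c) follows, assuming cuts are represented as sets of node indices on one side and cut load counts traffic in both directions; names are illustrative.

```python
import numpy as np

def slack_dtm_sets(tms, cuts, n_nodes, epsilon=0.01):
    """Build D(c): the slackly dominating TM indices per network cut."""
    d = {}
    for cut in cuts:                  # cut: frozenset of node indices
        side_a = sorted(cut)
        side_b = sorted(set(range(n_nodes)) - cut)
        loads = np.array([tm[np.ix_(side_a, side_b)].sum()
                          + tm[np.ix_(side_b, side_a)].sum()
                          for tm in tms])
        d[cut] = set(np.flatnonzero(loads >= (1.0 - epsilon) * loads.max()))
    return d
```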

Filtering module 206 may solve this minimum set cover problem by Integer Linear Programming (ILP). Filtering module 206 may define a binary assignment variable A_(M), which is set to 1 if a candidate DTM is selected in the end and set to 0 otherwise. The assignment variables may guarantee each network cut is represented by at least one of its candidate DTMs, and filtering module 206 may minimize the number of selected DTMs by minimizing the sum of the assignment variables.

$\min \sum_{M \in T} A_M$

$\text{s.t.} \quad \sum_{M \in D(c)} A_M \ge 1, \quad \forall c \in C$

$A_M \in \{0, 1\}, \quad \forall M \in T$

Filtering module 206 may achieve a low DTM count with a commercial ILP solver, such as FICO Xpress. In one example, a flow slack of approximately 1% can reduce the number of DTMs by over 75%, which corresponds to a substantial gain in terms of the computation needed for capacity planning. A further increase in the flow slack yields even larger reductions, though at the cost of hose coverage.
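
The formulation above maps directly onto an off-the-shelf ILP modeler. The following sketch uses the open-source PuLP package with its bundled CBC solver as a stand-in for a commercial solver such as FICO Xpress; it illustrates the set-cover ILP rather than the production formulation.

```python
import pulp

def select_min_dtms(d):
    """Minimum set cover over network cuts via ILP (cf. FIG. 9).

    `d` maps each network cut c to D(c), the indices of its slackly
    dominating TMs (e.g., as built by slack_dtm_sets above).
    """
    candidates = set().union(*d.values())        # T: all candidate DTMs
    prob = pulp.LpProblem("min_dtm_cover", pulp.LpMinimize)
    a = {m: pulp.LpVariable(f"A_{m}", cat="Binary") for m in candidates}
    prob += pulp.lpSum(a.values())               # minimize selected DTMs
    for cut, dtms in d.items():
        prob += pulp.lpSum(a[m] for m in dtms) >= 1  # cover every cut
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [m for m, var in a.items() if var.value() > 0.5]
```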

When generating traffic matrices, filtering module 206 may also evaluate the extent to which the entire hose space is covered. Filtering module 206 may define a metric to evaluate the degree to which the generated reference TMs cover the entire hose space. In particular, since the approach uses a two-stage process, sampling the hose space with a large number of TMs and further down-sampling them to reach a smaller number of DTMs, it is desirable to measure the hose coverage for each stage of the process.

As discussed above, the hose model is represented by a convex polytope P in a high-dimensional vector space, so a natural way to measure the coverage of a set of samples S would be by volume, namely the volume of the convex hull containing all the samples divided by the volume of the hose space, as follows.

$\mathrm{Coverage}(S, P) = \frac{\mathrm{Volume}(\mathrm{ConvexHull}(S))}{\mathrm{Volume}(P)}$

When applied to practical instances of network planning, however, this metric may become intractable. The complexity of computing a convex hull for V points in an L-dimensional space is approximately O(V^(L/2)). In this case, L=N²−N, where N is the node count in the network, which can be a few hundred, and the sample size V=|S| can be 10⁵.

Instead, filtering module 206 may establish the planar coverage of the hose space P by a set of samples S on a plane b as follows, where π(S, b) marks the projection of the samples in S on the plane b, and π(P, b) is the projection of the hose polytope P on b.

$\mathrm{PlanarCoverage}(S, P, b) = \frac{\mathrm{Area}(\mathrm{ConvexHull}(\pi(S, b)))}{\mathrm{Area}(\pi(P, b))}$

For a collection of planes B, the coverage of the hose space P by a set of samples S may be defined as the mean planar coverage of P by S across all the planes in B.

$\overline{\mathrm{Coverage}}(S, P) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{PlanarCoverage}(S, P, b_i)$

The choice of these planes is important for picturing the high-dimensional hose space truthfully. These planes should characterize all the variables in the hose constraints, and the variables should contribute equally to shaping the planes. Filtering module 206 may construct planes from the pair-wise combinations of the variables in the hose constraints. Each variable may be an off-diagonal coefficient of a valid TM M, i.e., a source-destination pair in the network.
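
A minimal sketch of the planar coverage computation follows, using scipy's ConvexHull (whose `volume` attribute is the area for two-dimensional inputs) and assuming the polytope's vertices, or a dense sample of its surface, are available; names are illustrative.

```python
import itertools
import numpy as np
from scipy.spatial import ConvexHull

def planar_coverage(samples, polytope_points, dims):
    """PlanarCoverage(S, P, b) for one plane b.

    `dims` picks the pair of TM variables (source-destination pairs)
    spanning the projection plane; each TM is assumed flattened so
    that one variable is one off-diagonal coefficient. The projected
    points must not be collinear.
    """
    proj_s = samples[:, list(dims)]
    proj_p = polytope_points[:, list(dims)]
    return ConvexHull(proj_s).volume / ConvexHull(proj_p).volume

def mean_coverage(samples, polytope_points, planes):
    """Average the planar coverage over a collection of planes B."""
    return np.mean([planar_coverage(samples, polytope_points, b)
                    for b in planes])

# Planes are the pair-wise combinations of the hose-constrained
# variables (N*N - N off-diagonal coefficients in an N-node network).
n_vars = 12  # e.g., N = 4
planes = list(itertools.combinations(range(n_vars), 2))
```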

Returning to FIG. 1, at step 130 one or more of the systems described herein may obtain physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices. For example, at step 130, obtaining module 208 may facilitate the obtaining of physical network resources to implement an upgrade architecture for a cross-layer network, such as the network shown in FIG. 7, where the architecture satisfies the tractable set of dominating traffic matrices that was created at step 120. Generally speaking, the cross-layer network upgrade architecture may size a link across each graph cut of the cut sets, as discussed above. The term “cross-layer network upgrade architecture” may refer to any plan, design, or specification for upgrading network capacity according to steps 130 and 140.

Obtaining module 208 may perform step 130 in a variety of ways. The term “obtaining” may broadly refer to any action, command, or instruction to obtain, purchase, reserve, control, or request corresponding physical network resources. In alternative examples, obtaining module 208 may perform an action that simply facilitates the retrieval of such resources without actually and entirely retrieving them. In some examples, step 130 may be performed entirely by obtaining module 208, and in other examples, step 130 may be performed by a human administrator, team, automated or semi-automated device, and/or by coordination between one of these and obtaining module 208.

Generally speaking, the physical network resources may include fiber and/or equipment. Moreover, obtaining module 208 may ensure that the cross-layer network upgrade architecture satisfies the tractable set of dominating traffic matrices. This may further guarantee, according to a predefined probability threshold, that the cross-layer network upgrade architecture will also satisfy a remainder of the set of traffic matrices. In other words, satisfying the dominating traffic matrices may help ensure that the resulting network capacity plan also satisfies the remainder of the set of traffic matrices from which the dominating traffic matrices were selected by comparison with the cut sets.

In some examples, the cross-layer network upgrade architecture for which obtaining module 208 is obtaining resources may be generated according to a cost optimization model. The cost optimization model may optionally generate the cross-layer network upgrade architecture through integer linear programming. The cost optimization model may optionally account for one or more of fiber count, space and power, spectrum consumption on fiber, and/or maximum capacity constraints on leased waves. Additionally, or alternatively, the cost optimization model may account for the cost of at least one or more of fiber procurement, optical or Internet Protocol hardware, operational cost, space and power, and/or provisioning cost. Furthermore, the cost optimization model may satisfy a flow conservation constraint for all planned network failures and/or jointly optimize for all dominating traffic matrices.

Returning to FIG. 1, at step 140 one or more of the systems described herein may allocate the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model. For example, at step 140, allocation module 210 may allocate the physical network resources across the multiple data centers shown in FIG. 7 according to the cross-layer network upgrade architecture referenced at step 130 such that a corresponding capacity level is increased while satisfying the data center constraint model generated at step 110.

Allocation module 210 may perform step 140 in a variety of ways. The term “allocate” may broadly refer to any action, command, or instruction that assigns or moves resources to implement the cross-layer network upgrade architecture. As with step 130, step 140 may be performed entirely by allocation module 210; in other examples, step 140 may be performed by a human administrator, a team, an automated or semi-automated device, and/or by coordination between one of these and allocation module 210. Thus, allocation module 210 may simply instruct, coordinate, or organize the implementation of infrastructure according to the cross-layer network upgrade architecture such that the corresponding network capacity plan is implemented and the data center constraint model is satisfied (e.g., because satisfying the tractable set of traffic matrices helps to ensure that the corresponding data center constraint model is also satisfied). Moreover, the term “satisfies” may refer to the cross-layer network upgrade architecture and/or network capacity plan providing sufficient capacity and/or network resources to accommodate or achieve the traffic flows indicated by the tractable set of traffic matrices, despite one or more failures that may be specified by the design policy and the corresponding network topology graph cut sets.

The above description provides an overview of method 100 shown in FIG. 1. Additionally, or alternatively, the following discussion provides a supplemental overview of concrete embodiments of the disclosed technology.

Modern technology companies often have a production backbone network that connects data centers and delivers the social network's content to the users. The network supports a vast number of different services, which are placed across a multitude of data centers. The traffic patterns shift over time from one data center to another due to changing placement requirements. As a result, there can be exponential and highly variable traffic demand growth. To meet service bandwidth expectations, it is desirable to have an accurate long-term demand forecast. However, due to the nature of the services, the fluidity of the workloads, and anticipation of future needs, identifying a precise forecast is nearly impossible. Thus, it is helpful for long-term network plans to account for traffic demand uncertainty.

This discussion covers the design methodology changes that can be made to absorb traffic uncertainty in network capacity upgrade planning. A classical approach to network planning is to size the topology to accommodate a given traffic matrix under a set of failures that are defined using a failure protection policy. In this approach: (i) the traffic matrix is the volume of the traffic that is forecast between any two data centers (pairwise demands), (ii) the failure protection policy is a set of commonly observed failures in the network, such as singular fiber cuts, dual submarine link failures, or a set of failures that have been encountered multiple times, and (iii) a cost optimization model is used to calculate the network turn-up plan. Essentially, this refers to an integer linear programming (ILP) formulation that performs max-flow over each individual failure and ensures the capacity to admit the traffic matrix for each failure.
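Written out, with notation assumed here for illustration (a pairwise traffic matrix D, a failure set Φ from the protection policy, integer capacity units x_e of size u on each edge e with unit cost c_e, per-failure flow variables f, and e(a) denoting the undirected edge of directed arc a), the classical formulation sketched above is roughly:

\[
\begin{aligned}
\min_{x,\,f} \quad & \sum_{e \in E} c_e\, x_e \\
\text{s.t.} \quad
& \sum_{a \in \delta^{+}(n)} f^{sd}_{a,\phi} - \sum_{a \in \delta^{-}(n)} f^{sd}_{a,\phi}
  = \begin{cases} D_{sd}, & n = s, \\ -D_{sd}, & n = d, \\ 0, & \text{otherwise,} \end{cases}
  && \forall (s,d),\ \forall \phi \in \Phi,\ \forall n, \\
& \sum_{(s,d)} f^{sd}_{a,\phi} \le u\, x_{e(a)} && \forall \phi \in \Phi,\ \forall a \notin \phi, \\
& f^{sd}_{a,\phi} = 0 && \forall \phi \in \Phi,\ \forall a \in \phi, \\
& x_e \in \mathbb{Z}_{\ge 0}, \quad f^{sd}_{a,\phi} \ge 0.
\end{aligned}
\]

Because each failure scenario φ carries its own flow variables, the capacity purchased on each edge must survive the worst scenario for the single given matrix D; the hose-based approach described below generalizes this setting so that D itself is uncertain.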

The following describes one or more problems to be solved. First, in terms of a lack of long-term fidelity, backbone network turn-up requires longer lead times, typically on the order of months. Even worse, this can extend to multiple years when deciding on terrestrial fiber and submarine investments. Given past growth and the dynamic nature of such services, it can be challenging to forecast service behavior over a timeframe beyond six months. One related approach was to handle traffic uncertainties by dimensioning the network for worst-case assumptions and sizing for a higher percentile, say P95. Nevertheless, asking every service owner to provide a traffic estimate per data center pair is hardly manageable. With the classical approach, a service owner is requested to give an explicit demand specification. That is daunting because not only are there visible changes in current service behavior, but it is also unknown what new services will be introduced and will consume the network in a timeframe of one year or more. The problem of exact traffic forecasting is even more difficult in the long term because the upcoming data centers are not even in production when the forecast is requested.

Second, in terms of abstracting the network as a resource, a service typically requires compute, storage, and network resources. Data centers get their shares of computing and storage resources allocated to them. The service owners can then analyze the short-term and long-term requirements for these as a consumable entity per data center. However, this is not true for the network, because the network is a shared resource. It is desirable to create a planning method that can abstract the network's complexity and present it to services like any other consumable entity per data center.

Third, in terms of operational churn, tracking every service's traffic surge, identifying its root cause, and tracking its potential impact is becoming increasingly difficult. Most of these surges are harmless because not all services surge simultaneously. Nonetheless, this still creates operational overhead for tracking many false alarms.

A solution to one or more of the above problems may be described as network hose-based planning. Instead of forecasting traffic for each (source, destination) pair, a traffic forecast is calculated for total-egress and total-ingress traffic per data center, i.e., the “network-hose.” Instead of asking “how much traffic would a service generate from X to Y,” one may ask “how much ingress and egress traffic does a service expect at X.” Thus, the O(N^2) data points per service may be replaced with O(N). When planning for aggregated traffic, one may also naturally factor statistical multiplexing into the forecast.
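As a toy illustration of this reduction (all names and values here are hypothetical, not the disclosed system), the following contrasts the two forecast shapes and checks whether a candidate traffic matrix lies within the hose constraints:

```python
dcs = ["DC1", "DC2", "DC3", "DC4"]

# Pipe-based forecast: O(N^2) pairwise entries per service (hypothetical).
pipe = {(s, d): 10.0 for s in dcs for d in dcs if s != d}

# Hose-based forecast: O(N) per-data-center totals (hypothetical Gbps).
hose = {dc: {"egress": 30.0, "ingress": 30.0} for dc in dcs}

def satisfies_hose(tm, hose):
    """Return True if traffic matrix `tm` lies inside the hose constraints."""
    for dc, caps in hose.items():
        egress = sum(v for (s, _), v in tm.items() if s == dc)
        ingress = sum(v for (_, d), v in tm.items() if d == dc)
        if egress > caps["egress"] or ingress > caps["ingress"]:
            return False
    return True

print(len(pipe), len(hose))        # 12 pairwise entries vs 4 hose entries
print(satisfies_hose(pipe, hose))  # True: each DC sends and receives 30
```

For N data centers the pipe forecast needs N(N-1) numbers per service, while the hose forecast needs only 2N (one egress and one ingress total per data center), which is what makes per-service estimation manageable.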

FIG. 4 reflects the change in input for the planning problem. Instead of a classical traffic matrix, potentially only hose-based traffic forecasts may be used as the basis to generate a network plan that supports the forecasts under all failures defined by the failure policy.

In terms of solving the planning challenge, while the “network-hose” model captures end-to-end demand uncertainty concisely, it may pose a different challenge: dealing with the infinitely many demand sets realizing the hose constraints. In other words, if one takes the convex polytope of all the demand sets that satisfy the hose constraints, there is a continuous space inside the polytope to deal with. Typically, this would be useful for an optimization problem, as one may leverage linear programming techniques to solve it effectively. However, this model's key difference is that each point inside the convex polytope is a single traffic matrix. The long-term network build plan has to satisfy all such demand sets if it is to fulfill the hose constraints. This creates an enormous computational challenge, as designing a cross-layer global production network is already an intensive optimization problem for a single demand set. These reasons drive the need to intelligently identify a few demand sets from this convex polytope that can serve as reference demand sets for the network design problem. In finding these reference demand sets, one may be interested in a few fundamental properties that should be satisfied: (i) these should be the demand sets that are likely to drive the need for additional resources on the production network (e.g., fiber and equipment), such that if one designs the network explicitly for this subset, one can guarantee with high probability that the remaining demand sets are covered, and (ii) the number of reference demand sets should be as small as possible to reduce the cross-layer network design problem's state-space.
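In symbols (notation assumed for illustration: E_i and I_i denote the forecast total egress and ingress hoses of data center i), the convex polytope just described may be written as:

\[
\mathcal{P} = \Bigl\{\, D \in \mathbb{R}^{N \times N}_{\ge 0} \;:\; \sum_{j \ne i} D_{ij} \le E_i \;\; \forall i, \quad \sum_{i \ne j} D_{ij} \le I_j \;\; \forall j \,\Bigr\}.
\]

Because the total traffic crossing any fixed graph cut is a linear function of D, its maximum over \(\mathcal{P}\) is attained at a vertex of the polytope; this observation is what lets a small, finite set of reference matrices dominate the continuum of demand sets.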

To identify these reference demand sets, one may exploit the cuts in the topology and the location (latitude, longitude) of the data centers to gain insight into the maximum flow that can cross a network cut. FIG. 10 shows a network cut in an example topology. This network cut partitions the topology into two sets of nodes, (1, 2, 3, 4) and (5, 6, 7, 8). To size the link on this network cut, one only needs the one traffic matrix that generates maximum traffic over the graph cut. All other traffic matrices with lower or equal traffic over this cut are admitted with no additional bandwidth requirement over the graph cut. Note that, in a topology with N nodes, one can create 2^N network cuts and have one traffic matrix per cut. However, the geographical nature of these cuts is essential, given the planar nature of the network topology. It turns out that simple cuts (typically straight-line cuts) are potentially more helpful for dimensioning the topology than more complicated cuts. As further shown in FIG. 10, a traffic matrix for each of the simple cuts is more meaningful than a traffic matrix for the “complicated” cuts because a “complicated” cut is already taken into account by a set of simple cuts. By focusing on the simple cuts, one may reduce the number of reference demand sets to the smallest traffic matrix set. One may then solve these traffic matrices using a cost optimization model and produce a network plan supporting all possible traffic matrices. Based on simulations, one may observe that, given the nature of the network topology, the additional capacity required for the hose-based traffic matrices is not significant, while the approach provides powerful simplicity in network planning and operational workflows. One may also adopt the hose-based traffic characterization for dimensioning the production backbone network because it enables simplified network planning and operations and helps the services interact with the network like any other consumable entity, i.e., just like power, computation, or storage.
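The per-cut step can be sketched as a small linear program (the hose values, the eight-node partition echoing FIG. 10, and the use of scipy are assumptions for illustration, not the disclosed implementation): for one cut separating node set A from node set B, find the traffic matrix inside the hose polytope that maximizes the traffic crossing the cut.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-data-center hoses (Gbps).
egress = {"1": 40.0, "2": 20.0, "3": 30.0, "4": 10.0,
          "5": 25.0, "6": 25.0, "7": 15.0, "8": 35.0}
ingress = dict(egress)  # symmetric hoses for simplicity (assumption)

A_side, B_side = ["1", "2", "3", "4"], ["5", "6", "7", "8"]
pairs = [(i, j) for i in A_side for j in B_side]  # A -> B demands only

c = -np.ones(len(pairs))  # maximize cross-cut traffic == minimize negative
rows, rhs = [], []
for i in A_side:  # egress hose: sum over j of D[i, j] <= egress[i]
    rows.append([1.0 if p[0] == i else 0.0 for p in pairs])
    rhs.append(egress[i])
for j in B_side:  # ingress hose: sum over i of D[i, j] <= ingress[j]
    rows.append([1.0 if p[1] == j else 0.0 for p in pairs])
    rhs.append(ingress[j])

res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=(0, None))
dominating = {p: round(v, 3) for p, v in zip(pairs, res.x) if v > 1e-9}
print(-res.fun)    # 100.0 == min(total egress of A, total ingress of B)
print(dominating)  # one dominating traffic matrix for this cut
```

For this transportation-style LP the optimum equals min(total egress of side A, total ingress of side B), matching the intuition that a cut's dominating matrix saturates the scarcer hose; repeating the solve over each simple cut yields the tractable set of dominating traffic matrices fed to the cost optimizer.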

As detailed above, the computing devices and systems described and/or illustrated herein broadly represent any type or form of computing device or system capable of executing computer-readable instructions, such as those contained within the modules described herein. In their most basic configuration, these computing device(s) may each include at least one memory device and at least one physical processor.

In some examples, the term “memory device” generally refers to any type or form of volatile or non-volatile storage device or medium capable of storing data and/or computer-readable instructions. In one example, a memory device may store, load, and/or maintain one or more of the modules described herein. Examples of memory devices include, without limitation, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Hard Disk Drives (HDDs), Solid-State Drives (SSDs), optical disk drives, caches, variations or combinations of one or more of the same, or any other suitable storage memory.

In some examples, the term “physical processor” generally refers to any type or form of hardware-implemented processing unit capable of interpreting and/or executing computer-readable instructions. In one example, a physical processor may access and/or modify one or more modules stored in the above-described memory device. Examples of physical processors include, without limitation, microprocessors, microcontrollers, Central Processing Units (CPUs), Field-Programmable Gate Arrays (FPGAs) that implement softcore processors, Application-Specific Integrated Circuits (ASICs), portions of one or more of the same, variations or combinations of one or more of the same, or any other suitable physical processor.

Although illustrated as separate elements, the modules described and/or illustrated herein may represent portions of a single module or application. In addition, in certain embodiments one or more of these modules may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, one or more of the modules described and/or illustrated herein may represent modules stored and configured to run on one or more of the computing devices or systems described and/or illustrated herein. One or more of these modules may also represent all or portions of one or more special-purpose computers configured to perform one or more tasks.

In addition, one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another. Additionally, or alternatively, one or more of the modules recited herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.

In some embodiments, the term “computer-readable medium” generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions. Examples of computer-readable media include, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Disks (CDs), Digital Video Disks (DVDs), and BLU-RAY disks), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.

The process parameters and sequence of the steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or include additional steps in addition to those disclosed.

The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. This exemplary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the present disclosure. The embodiments disclosed herein should be considered in all respects illustrative and not restrictive. Reference should be made to the appended claims and their equivalents in determining the scope of the present disclosure.

Unless otherwise noted, the terms “connected to” and “coupled to” (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms “a” or “an,” as used in the specification and claims, are to be construed as meaning “at least one of.” Finally, for ease of use, the terms “including” and “having” (and their derivatives), as used in the specification and claims, are interchangeable with and have the same meaning as the word “comprising.”

As explained above, this application discloses technology that may increase the ability of network capacity upgrade planners to efficiently plan upgrades to network capacity while better handling uncertainty in the forecasting of future network traffic growth. The technology may generally achieve these benefits by applying a hose-based approach, rather than a pipe-based approach, to network planning, as discussed in more detail above. In particular, the technology may fundamentally base the plan for the network capacity upgrade on a constraint model that builds in uncertainty and, therefore, generates a multitude of different traffic matrices to be satisfied by the network capacity upgrade plan. The resulting plan may thereby be more resilient and effective than conventional plans that are based upon a single network traffic matrix forecast to describe expected growth in network traffic. Such single network traffic matrices may be brittle and less effective at handling uncertainty.

EXAMPLE EMBODIMENTS

Example 1: A computer-implemented method for configuring networks may include (i) generating a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers, (ii) filtering a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices, (iii) obtaining physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices, and (iv) allocating the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model.

Example 2: The method of Example 1, where the traffic matrices indicate a volume of traffic that is forecast between each pair of data centers in the multiple data centers.

Example 3: The method of any one or more of Examples 1-2, where the network failures include at least two of singular fiber cuts, dual submarine link failures, and/or repeated failures.

Example 4: The method of any one or more of Examples 1-3, where filtering the set of traffic matrices is performed through integer linear programming.

Example 5: The method of any one or more of Examples 1-4, where the data center constraint model indicates a convex polytope.

Example 6: The method of any one or more of Examples 1-5, where the physical network resources include fiber and equipment.

Example 7: The method of any one or more of Examples 1-6, where filtering the set of traffic matrices selects traffic matrices that are predicted to drive requirements for the physical network resources.

Example 8: The method of any one or more of Examples 1-7, where ensuring that the cross-layer network upgrade architecture satisfies the tractable set of dominating traffic matrices guarantees, according to a predefined probability threshold, that the cross-layer network upgrade architecture will also satisfy a remainder of the set of traffic matrices.

Example 9: The method of any one or more of Examples 1-8, where a number of traffic matrices in the tractable set of dominating traffic matrices is minimized.

Example 10: The method of any one or more of Examples 1-9, where comparing the set of traffic matrices against the cut sets of the network topology that indicate network failures includes selecting a set of traffic matrices that generates a maximum level of traffic over each graph cut of the cut sets.

Example 11: The method of any one or more of Examples 1-10, where generating the cross-layer network upgrade architecture includes sizing a link across each graph cut of the cut sets.

Example 12: The method of any one or more of Examples 1-11, where each graph cut of the cut sets forms a substantially straight line.

Example 13: The method of any one or more of Examples 1-12, where each graph cut of the cut sets forms a straight line.

Example 14: The method of any one or more of Examples 1-13, where generating the cross-layer network upgrade architecture is performed according to a cost optimization model.

Example 15: The method of any one or more of Examples 1-14, where the cost optimization model generates the cross-layer network upgrade architecture through integer linear programming.

Example 16: The method of any one or more of Examples 1-15, where the cost optimization model accounts for physical constraints in terms of at least three of: fiber count, space and power, spectrum consumption on fiber, or maximum capacity constraints on leased waves.

Example 17: The method of any one or more of Examples 1-16, where the cost optimization model accounts for cost of at least three of: fiber procurement, optical or Internet Protocol hardware, operational cost, space and power, or provisioning cost.

Example 18: The method of any one or more of Examples 1-17, where the cross-layer network upgrade architecture is generated according to a hose-based computation rather than a pipe-based computation.

Example 19: A corresponding system may include (i) a generation module, stored in memory, that generates a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers, (ii) a filtering module, stored in memory, that filters a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices, (iii) an obtaining module, stored in memory, that facilitates the obtaining of physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices, (iv) an allocation module, stored in memory, that allocates the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model, and (v) at least one physical processor configured to execute the generation module, the filtering module, the obtaining module, and the allocation module.

Example 20: A corresponding non-transitory computer-readable medium may include one or more computer-readable instructions that, when executed by at least one processor of a computing device, cause the computing device to: (i) generate a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers, (ii) filter a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices, (iii) facilitate the obtaining of physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices, and (iv) allocate the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model.


What is claimed is:
1. A computer-implemented method comprising: generating a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers; filtering a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices; obtaining physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices; and allocating the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model.

2. The computer-implemented method of claim 1, wherein the traffic matrices indicate a volume of traffic that is forecast between each pair of data centers in the multiple data centers.

3. The computer-implemented method of claim 1, wherein the network failures comprise at least two of singular fiber cuts, dual submarine link failures, and repeated failures.

4. The computer-implemented method of claim 1, wherein filtering the set of traffic matrices is performed through integer linear programming.

5. The computer-implemented method of claim 1, wherein the data center constraint model indicates a convex polytope.

6. The computer-implemented method of claim 1, wherein the physical network resources comprise fiber and equipment.

7. The computer-implemented method of claim 1, wherein filtering the set of traffic matrices selects traffic matrices that are predicted to drive requirements for the physical network resources.

8. The computer-implemented method of claim 1, wherein ensuring that the cross-layer network upgrade architecture satisfies the tractable set of dominating traffic matrices guarantees, according to a predefined probability threshold, that the cross-layer network upgrade architecture will also satisfy a remainder of the set of traffic matrices.

9. The computer-implemented method of claim 1, wherein a number of traffic matrices in the tractable set of dominating traffic matrices is minimized.

10. The computer-implemented method of claim 1, wherein comparing the set of traffic matrices against the cut sets of the network topology that indicate network failures comprises selecting a set of traffic matrices that generates a maximum level of traffic over each graph cut of the cut sets.

11. The computer-implemented method of claim 1, wherein generating the cross-layer network upgrade architecture comprises sizing a link across each graph cut of the cut sets.

12. The computer-implemented method of claim 11, wherein each graph cut of the cut sets forms a substantially straight line.

13. The computer-implemented method of claim 12, wherein each graph cut of the cut sets forms a straight line.

14. The computer-implemented method of claim 1, wherein generating the cross-layer network upgrade architecture is performed according to a cost optimization model.

15. The computer-implemented method of claim 14, wherein the cost optimization model generates the cross-layer network upgrade architecture through integer linear programming.

16. The computer-implemented method of claim 14, wherein the cost optimization model accounts for physical constraints in terms of at least three of: fiber count; space and power; spectrum consumption on fiber; or maximum capacity constraints on leased waves.

17. The computer-implemented method of claim 14, wherein the cost optimization model accounts for cost of at least three of: fiber procurement; optical or Internet Protocol hardware; operational cost; space and power; or provisioning cost.

18. The computer-implemented method of claim 1, wherein the cross-layer network upgrade architecture is generated according to a hose-based computation rather than a pipe-based computation.

19. A system comprising: a processor; and a memory comprising instructions that, when executed, cause the processor to: generate a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers; filter a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices; facilitate the obtaining of physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices; and allocate the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model.

20. A non-transitory computer-readable medium comprising instructions that, when executed, cause a computing device to: generate a data center constraint model by placing a constraint on a total amount of ingress or egress traffic a service expects from each respective data center of multiple data centers; filter a set of traffic matrices that indicate points in the data center constraint model by comparing the set of traffic matrices against cut sets of a network topology that indicate network failures to create a tractable set of dominating traffic matrices; facilitate the obtaining of physical network resources to implement a cross-layer network upgrade architecture that satisfies the tractable set of dominating traffic matrices; and allocate the physical network resources across the multiple data centers according to the cross-layer network upgrade architecture such that a capacity level of the multiple data centers is increased while satisfying the data center constraint model.